ggplot2 demo: Tidy Tuesday

MATH 70076 - Data Science

Dr Zak Varty

Aim

  • Build skills with {ggplot2}
  • Show iterative development of a data visualisation


How will we do it?

Educational Attainment of Young People in English Towns

The dataset comes from The UK Office for National Statistics.

It was explored in the July 2023 article “Why do children and young people in smaller towns do better academically than those in larger towns?”.


Reproduce this plot from the article.

Template R script


2024-01-23_tidy-tuesday.R
# Tidy Tuesday 2024-01-23 English Education

# Load required packages -------------------------------------------------------

# Load data --------------------------------------------------------------------

# Data wrangling ---------------------------------------------------------------

# Create plot ------------------------------------------------------------------

# Save plot --------------------------------------------------------------------

# End of file

Planning

Planning


Before we get started coding it is a good idea to take some time with a pen and paper to work out exactly what data we need to construct this plot and what form we need it in.


This stage would usually come after some exploration of the data and a rough hand-sketch of the plot we want to make. Our job is a bit simpler in this case, because we are reconstructing an existing plot. This means we don’t have to make as many decisions of our own, instead we just have to pay careful attention to the choices made in the original plot.

03:00

town_scores

label_df and arrow_df

Strings and Colours

Data Wrangling

Load Required Packages and Data


2024-01-23_tidy-tuesday.R # Load required packages ----
# Load required packages -------------------------------------------------------
library(readr)              # loading data
library(dplyr)              # data frame manipulation
library(stringr)            # character string manipulation



2024-01-23_tidy-tuesday.R # Load data ----
# Load data --------------------------------------------------------------------
url <- 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-01-23/english_education.csv'
education <- readr::read_csv(file = url)

Making town_scores

Take a look at the structure of the data we have loaded and try to construct town_scores.


Console
colnames(education)
 [1] "town11cd"                                                         
 [2] "town11nm"                                                         
 [3] "population_2011"                                                  
 [4] "size_flag"                                                        
 [5] "rgn11nm"                                                          
 [6] "coastal"                                                          
 [7] "coastal_detailed"                                                 
 [8] "ttwa11cd"                                                         
 [9] "ttwa11nm"                                                         
[10] "ttwa_classification"                                              
[11] "job_density_flag"                                                 
[12] "income_flag"                                                      
[13] "university_flag"                                                  
[14] "level4qual_residents35_64_2011"                                   
[15] "ks4_2012_2013_counts"                                             
[16] "key_stage_2_attainment_school_year_2007_to_2008"                  
[17] "key_stage_4_attainment_school_year_2012_to_2013"                  
[18] "level_2_at_age_18"                                                
[19] "level_3_at_age_18"                                                
[20] "activity_at_age_19_full_time_higher_education"                    
[21] "activity_at_age_19_sustained_further_education"                   
[22] "activity_at_age_19_appprenticeships"                              
[23] "activity_at_age_19_employment_with_earnings_above_0"              
[24] "activity_at_age_19_employment_with_earnings_above_10_000"         
[25] "activity_at_age_19_out_of_work"                                   
[26] "highest_level_qualification_achieved_by_age_22_less_than_level_1" 
[27] "highest_level_qualification_achieved_by_age_22_level_1_to_level_2"
[28] "highest_level_qualification_achieved_by_age_22_level_3_to_level_5"
[29] "highest_level_qualification_achieved_by_age_22_level_6_or_above"  
[30] "highest_level_qualification_achieved_b_age_22_average_score"      
[31] "education_score"                                                  

What do these columns mean?


Referring to the data dictionary, it seems that the relevant columns are likely to be:

  • rgn11nm (character) the region name as recorded in 2011;
  • education_score (numeric) the town/city education score based on attainment levels of the 2012/13 Key stage 4 cohort;
  • income_flag (character) a variable used to describe towns as lower income deprivation, mid income deprivation or higher income deprivation.

Extract Rows on Towns

Our first step is to consider only the rows which correspond to towns. We can do this using the income_flag.


Console
unique(education$income_flag)
[1] "Higher deprivation towns" "Mid deprivation towns"   
[3] "Lower deprivation towns"  "Cities"                  
[5] NA                        

Extract Rows on Towns

We filter to only rows there the income_flag contains the string "town".

Console
education %>% 
  filter(str_detect(income_flag, pattern = "town"))
# A tibble: 1,082 × 31
   town11cd  town11nm population_2011 size_flag rgn11nm coastal coastal_detailed
   <chr>     <chr>              <dbl> <chr>     <chr>   <chr>   <chr>           
 1 E34000007 Carlton…            5456 Small To… East M… Non-co… Smaller non-coa…
 2 E34000016 Dorches…           19060 Small To… South … Non-co… Smaller non-coa…
 3 E34000020 Ely BUA            19090 Small To… East o… Non-co… Smaller non-coa…
 4 E34000026 Market …            6429 Small To… Yorksh… Non-co… Smaller non-coa…
 5 E34000027 Downham…           10884 Small To… East o… Non-co… Smaller non-coa…
 6 E34000039 Penrith…           15181 Small To… North … Non-co… Smaller non-coa…
 7 E34000048 Bolsove…           11754 Small To… East M… Non-co… Smaller non-coa…
 8 E34000055 March B…           21051 Medium T… East o… Non-co… Large non-coast…
 9 E34000056 Southam…            6567 Small To… West M… Non-co… Smaller non-coa…
10 E34000067 Royston…           15781 Small To… East o… Non-co… Smaller non-coa…
# ℹ 1,072 more rows
# ℹ 24 more variables: ttwa11cd <chr>, ttwa11nm <chr>,
#   ttwa_classification <chr>, job_density_flag <chr>, income_flag <chr>,
#   university_flag <chr>, level4qual_residents35_64_2011 <chr>,
#   ks4_2012_2013_counts <dbl>,
#   key_stage_2_attainment_school_year_2007_to_2008 <dbl>,
#   key_stage_4_attainment_school_year_2012_to_2013 <dbl>, …

Tidying up

We next provide more meaningful names, format the columns we will keep and drop all irrelevant columns.

Console
education %>% 
  filter(str_detect(income_flag, pattern = "town")) %>% 
  mutate(region = as.factor(rgn11nm), income = as.factor(income_flag)) %>%
  select(region, income, education_score)
# A tibble: 1,082 × 3
   region                   income                   education_score
   <fct>                    <fct>                              <dbl>
 1 East Midlands            Higher deprivation towns          -0.534
 2 South West               Mid deprivation towns              1.95 
 3 East of England          Lower deprivation towns           -1.04 
 4 Yorkshire and The Humber Lower deprivation towns           -1.25 
 5 East of England          Higher deprivation towns          -1.17 
 6 North West               Lower deprivation towns            0.845
 7 East Midlands            Higher deprivation towns          -3.72 
 8 East of England          Higher deprivation towns          -2.17 
 9 West Midlands            Lower deprivation towns            6.44 
10 East of England          Lower deprivation towns            0.295
# ℹ 1,072 more rows

Calculate Group Summaries

To calculate our average education score in each region-income combination, we have to group our observations by both of these factors.

We can then calculate a the required summary statistic for each group.


Console
education %>% 
  filter(str_detect(income_flag, pattern = "town")) %>% 
  mutate(region = as.factor(rgn11nm), income = as.factor(income_flag)) %>%
  select(region, income, education_score) %>% 
  group_by(region, income) %>% 
  summarise(mean_education_score = mean(education_score))

Calculate Group Summaries

# A tibble: 24 × 3
# Groups:   region [8]
   region          income                   mean_education_score
   <fct>           <fct>                                   <dbl>
 1 East Midlands   Higher deprivation towns                -2.96
 2 East Midlands   Lower deprivation towns                  2.23
 3 East Midlands   Mid deprivation towns                   -1.18
 4 East of England Higher deprivation towns                -3.35
 5 East of England Lower deprivation towns                  2.60
 6 East of England Mid deprivation towns                   -1.92
 7 North East      Higher deprivation towns                -2.14
 8 North East      Lower deprivation towns                  4.63
 9 North East      Mid deprivation towns                    1.42
10 North West      Higher deprivation towns                -1.30
# ℹ 14 more rows

Ungrouping

summarise() has grouped output by ‘region’. You can override using the .groups argument.

This warning reminds us that this summarise() call returns a tibble that is still grouped by region. To avoid any mistakes later, let’s undo this grouping.


Console
education %>% 
  filter(str_detect(income_flag, pattern = "town")) %>% 
  mutate(region = as.factor(rgn11nm), income = as.factor(income_flag)) %>%
  select(region, income, education_score) %>% 
  group_by(region, income) %>% 
  summarise(mean_education_score = mean(education_score)) %>% 
  ungroup()

Storing town_scores



2024-01-23_tidy-tuesday.R # Data Wrangling ----
town_scores <- education %>% 
  filter(str_detect(income_flag, pattern = "town")) %>% 
  mutate(region = as.factor(rgn11nm), income = as.factor(income_flag)) %>%
  select(region, income, education_score) %>% 
  group_by(region, income) %>% 
  summarise(mean_education_score = mean(education_score)) %>% 
  ungroup()

Making label_df

The first column in labels_df should be the mean education scores for the North West region. We can easily extract these from town_scores, which we just created.

Console
town_scores %>% 
  filter(region == "North West") %>% 
  select(mean_education_score)
# A tibble: 3 × 1
  mean_education_score
                 <dbl>
1                -1.30
2                 5.00
3                 1.72

The issue here is that we still have a tibble. If we want to extract a column as a numeric vector then we can use pull instead of select.

Getting NW_scores


Console
NW_scores <- town_scores %>% 
  filter(region == "North West") %>% 
  pull(mean_education_score)

NW_scores
[1] -1.302181  5.004601  1.721192


Console
label_df <- tibble(
  horizontal = NW_scores,
  vertical = rep(9, 3),
  string = c("Higher income deprivation", "Mid", "Lower")
)

label_df
# A tibble: 3 × 3
  horizontal vertical string                   
       <dbl>    <dbl> <chr>                    
1      -1.30        9 Higher income deprivation
2       5.00        9 Mid                      
3       1.72        9 Lower                    

Making arrow_df

We can similarly construct arrow_df.


Console
arrow_df <- tibble(
  horizontal = NW_scores,
  upper = rep(8.75, 3),
  lower = rep(8.25, 3)
)

arrow_df
# A tibble: 3 × 3
  horizontal upper lower
       <dbl> <dbl> <dbl>
1      -1.30  8.75  8.25
2       5.00  8.75  8.25
3       1.72  8.75  8.25

Adding to script


2024-01-23_tidy-tuesday.R # Data Wrangling ----
town_scores <- education %>% 
  filter(str_detect(income_flag, pattern = "town")) %>% 
  mutate(region = as.factor(rgn11nm), income = as.factor(income_flag)) %>%
  select(region, income, education_score) %>% 
  group_by(region, income) %>% 
  summarise(mean_education_score = mean(education_score)) %>% 
  ungroup()

NW_scores <- town_scores %>% 
  filter(region == "North West") %>% 
  pull(mean_education_score)

label_df <- tibble(
  horizontal = NW_scores,
  vertical = rep(9, 3),
  string = c("Higher income deprivation", "Lower", "Mid")
)

arrow_df <- tibble(
  horizontal = NW_scores,
  upper = rep(8.75, 3),
  lower = rep(8.25, 3)
)

Strings and Colour Palette

The final step in our data preparations is to create some variables that we can use later to access the colours and texts strings that we will use when creating the plot.


2024-01-23_tidy-tuesday.R # Data Wrangling ----
# Note added line breaks (\n) to match original formatting
strings <- list(
  title = "Towns in the North West have the highest attainment scores at
  all  \n income deprivation levels",
  sub = "\n  \n  Average educational attainment score for towns, by region
  and income  \n  deprivation level, England",
  cap = "Source: Office for National Statistics analysis using Longitudinal
  Education Outcomes (LEO)  \n from the Department for Education (DfE) and Index
  of Multiple Deprivation 2019 from the  \n Department for Levelling Up, Housing
  and Communities (DLUHC)"
)

ONS_palette <- list(
  red = "#7c265a",
  blue = "#315f91",
  light_grey = "#d3d3d3",
  dark_grey = "#8A8A8A",
  black = "#222222",
  white = "#ffffff")

Plotting

Loading {ggplot2}

We will use {ggplot2} to create our visualisation, so we have to add this to our library calls.

2024-01-23_tidy-tuesday.R # Load Packages ----
library(ggplot2)

Basic Plot

Using geom_point() create a scatter plot. We want to use town_scores as the source of our data, mapping: the mean_score column to the x location of each point, the region column to the y location of each point, the income column to determine the colour of each point.

ggplot() + 
  geom_point(
    data = town_scores,
    mapping = aes(
      x = mean_education_score,
      y = region,
      colour = income)
  )

Fix Plotting Character

Switch to solid circles, plotting character 21.

ggplot() + 
  geom_point(
    data = town_scores,
    mapping = aes(
      x = mean_education_score,
      y = region,
      colour = income),
    shape = 21
  )

Use fill for hollow shapes

ggplot() + 
  geom_point(
    data = town_scores,
    mapping = aes(
      x = mean_education_score,
      y = region,
      fill = income),
    shape = 21
  )

Add vertical line at x = 0

We can add another layer to our plot, to thicken the vertical line \(x=0\).

ggplot() + 
  geom_point(
    data = town_scores,
    mapping = aes(
      x = mean_education_score,
      y = region,
      fill = income),
    shape = 21) + 
  geom_vline(
    xintercept = 0,
    linewidth = 0.8,
    colour = ONS_palette$dark_grey)

Remove grey background with theme_minimal()

# Structure
p <- ggplot() + 
  geom_point(
    data = town_scores,
    mapping = aes(
      x = mean_education_score,
      y = region,
      fill = income),
    shape = 21) + 
  geom_vline(
    xintercept = 0,
    linewidth = 0.8,
    colour = ONS_palette$dark_grey) 

# Cosmetics
p + theme_minimal()

Remove Legend and Axis Labels


p + 
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank()
  )

Fix grid lines

We can use theme() to correct other cosmetic problems, such as removing the minor vertical grid lines and making the horizontal grid lines dashed.


p + 
  theme_minimal() + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(linetype = 2)
  )

Adjust plotting area with coord_cartesian()

# Structure
p <- p + 
  coord_cartesian(xlim = c(-4, 6), ylim = c(1.25,8.75))

# Cosmetics
p + 
  theme_minimal() + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(linetype = 2)
  )


This has fixed one problem but created another. Now our grid lines extend up too far. Let’s replace them with line segments using geom_linerange().

Remove vertical grid lines

Remove the gridlines using theme() and define lines we want to draw.

p + 
  theme_minimal() + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(linetype = 2)
  )

vertical_grid_df <- tibble(
  x = seq(-4, 6, by = 2),
  ymin = rep(0, length(x)),
  ymax = rep(8.25, length(x))
)

Add a custom vertical grid

Add those lines that we just defined. Beautiful!

p <- ggplot() + 
  geom_point(
    data = town_scores,
    mapping = aes(
      x = mean_education_score,
      y = region,
      fill = income),
    shape = 21,
  ) + 
  geom_linerange(
    mapping = aes(x = 0, ymin = 0.25, ymax = 8.25),
    colour = ONS_palette$dark_grey,
    linewidth = 0.8) +
  geom_linerange(
    data = vertical_grid_df,
    mapping = aes(x = x, ymin = ymin, ymax = ymax),
    colour = ONS_palette$light_grey,
    linewidth = 0.3) +
  coord_cartesian(xlim = c(-4, 6), ylim = c(1.25,8.75))

p +   
  theme_minimal() + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(linetype = 2)
  )

Adding Strings

Now let’s add text to the plot. That looks disgusting.

p <- p +
  labs(title = strings$title,
       subtitle = strings$sub,
       caption = strings$cap)

p +   
  theme_minimal() + 
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(linetype = 2)
  )

Left align and resize using {ggtext}

library(ggtext) # for element_markdown()


theme_ONS <- theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(linetype = 2),
    plot.title = element_markdown(
      size = 15,
      hjust = 0,
      face = "bold",
      margin = margin(0,0,15,0)),
    plot.title.position = "plot",
    plot.subtitle = element_markdown(
      size = 14,
      hjust = 0,
      margin = margin(0,0,15,0)),
    plot.caption = element_markdown(
      size = 12,
      hjust = 0,
      color = ONS_palette$dark_grey),
    plot.caption.position = "plot")

p +   
  theme_minimal() + 
  theme_ONS

Putting places in the correct order

To get the locations arranged with Northern places at the top and Southern places at the bottom, we need to re-level the location factor in town_scores.

library(forcats)

town_scores <- town_scores %>% 
mutate(region = fct_relevel(region,
  "South West",
  "East of England",
  "South East",
  "East Midlands",
  "West Midlands",
  "Yorkshire and The Humber",
  "North East",
  "North West"))

p <- ggplot() + 
  geom_point(
    data = town_scores,
    mapping = aes(
      x = mean_education_score,
      y = region,
      fill = income),
    shape = 21) + 
  geom_linerange(
    mapping = aes(x = 0, ymin = 0.25, ymax = 8.25),
    colour = ONS_palette$dark_grey,
    linewidth = 0.8) +
  geom_linerange(
    data = vertical_grid_df,
    mapping = aes(x = x, ymin = ymin, ymax = ymax),
    colour = ONS_palette$light_grey,
    linewidth = 0.3) +
  coord_cartesian(xlim = c(-4, 6), ylim = c(1.25,8.75)) + 
  labs(title = strings$title, subtitle = strings$sub, caption = strings$cap)

p + theme_minimal() + theme_ONS

Split Yourkshire and The Humber

Add a line break into the third level of the location factor.

town_scores <- town_scores %>% 
  mutate(region = fct_recode(region,
  "Yorkshire and\n The Humber" = "Yorkshire and The Humber"))

p <- ggplot() + 
  geom_point(
    data = town_scores,
    mapping = aes(
      x = mean_education_score,
      y = region,
      fill = income),
    shape = 21,
  ) + 
  geom_linerange(
    mapping = aes(x = 0, ymin = 0.25, ymax = 8.25),
    colour = ONS_palette$dark_grey,
    linewidth = 0.8) +
  geom_linerange(
    data = vertical_grid_df,
    mapping = aes(x = x, ymin = ymin, ymax = ymax),
    colour = ONS_palette$light_grey,
    linewidth = 0.3) +
  coord_cartesian(xlim = c(-4, 6), ylim = c(1.25,8.75)) + 
  labs(title = strings$title, subtitle = strings$sub, caption = strings$cap)

p + theme_minimal() + theme_ONS

Match the ONS colour scheme

Make the plotting characters larger and switch to a custom ONS colour scheme.


p <- ggplot() + 
  geom_point(
    data = town_scores,
    mapping = aes(
      x = mean_education_score,
      y = region,
      fill = income),
    shape = 21,
    size = 4) + 
  geom_linerange(
    mapping = aes(x = 0, ymin = 0.25, ymax = 8.25),
    colour = ONS_palette$dark_grey,
    linewidth = 0.8) +
  geom_linerange(
    data = vertical_grid_df,
    mapping = aes(x = x, ymin = ymin, ymax = ymax),
    colour = ONS_palette$light_grey,
    linewidth = 0.3) +
  coord_cartesian(xlim = c(-4, 6), ylim = c(1.25,8.75)) + 
  labs(title = strings$title, subtitle = strings$sub, caption = strings$cap) + 
  scale_fill_manual(values = unname(ONS_palette))

p + theme_minimal() + theme_ONS

Add Labels

We can add a layer of text annotations to our plot using geom_text().

p + 
  geom_text(
    data = label_df,
    mapping = aes(
      x = horizontal,
      y = vertical, 
      label = string),
    color = ONS_palette$black) +
  theme_minimal() +
  theme_ONS

Add Arrows

Similarly, geom_segment() can be adapted to draw arrows. To avoid these overlapping with our text, we also have to nudge our labels up a little bit.

p <- p + 
  geom_text(
    data = label_df,
    mapping = aes(
      x = horizontal, 
      y = vertical, 
      label = string),
    color = ONS_palette$black) +
  geom_segment(
    data = arrow_df,
    mapping = aes(
      x = horizontal, 
      xend = horizontal,
      y = upper,
      yend = lower),
    arrow = arrow(length = unit(0.1, "inches"))
    )

p + theme_minimal() + theme_ONS

Pulling it all together


2024-01-23_tidy-tuesday.R
# Tidy Tuesday 2024-01-23 English Education

# Load required packages -------------------------------------------------------
library(readr)                  # loading data
library(dplyr)                  # data frame manipulation
library(stringr)                # character string manipulation
library(ggplot2)                # plotting
library(ggtext)                 # element_markdown for subtitle formatting
library(forcats)                # working with factors

# Load data --------------------------------------------------------------------
url <- 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-01-23/english_education.csv'
education <- readr::read_csv(file = url)

# Data wrangling ---------------------------------------------------------------

## Main plot ---------------------------------

town_scores <- education %>%
  filter(str_detect(income_flag, pattern = "town")) %>%
  mutate(region = as.factor(rgn11nm), income = as.factor(income_flag)) %>%
  select(region, income, education_score) %>%
  group_by(region, income) %>%
  summarise(mean_education_score = mean(education_score)) %>%
  ungroup()  %>%
mutate(region = fct_relevel(region,
  "South West",
  "East of England",
  "South East",
  "East Midlands",
  "West Midlands",
  "Yorkshire and The Humber",
  "North East",
  "North West"))%>%
  mutate(region = fct_relevel(
    region,
    "South West",
    "East of England",
    "South East",
    "East Midlands",
    "West Midlands",
    "Yorkshire and The Humber",
    "North East",
    "North West")) %>%
  mutate(region = fct_recode(region,
    "Yorkshire and\n The Humber" = "Yorkshire and The Humber"))



## Annotations ---------------------------------

NW_scores <- town_scores %>%
  filter(region == "North West") %>%
  pull(mean_education_score)

label_df <- tibble(
  horizontal = NW_scores,
  vertical = rep(9, 3),
  string = c("Higher income deprivation", "Lower", "Mid")
)

arrow_df <- tibble(
  horizontal = NW_scores,
  upper = rep(8.75, 3),
  lower = rep(8.25, 3)
)

strings <- list(
  title = "Towns in the North West have the highest attainment scores at
  all  \n income deprivation levels",
  sub = "\n  \n  Average educational attainment score for towns, by region
  and income  \n  deprivation level, England",
  cap = "Source: Office for National Statistics analysis using Longitudinal
  Education Outcomes (LEO)  \n from the Department for Education (DfE) and Index
  of Multiple Deprivation 2019 from the  \n Department for Levelling Up, Housing
  and Communities (DLUHC)"
)

ONS_palette <- list(
  red = "#7c265a",
  blue = "#315f91",
  light_grey = "#d3d3d3",
  dark_grey = "#8A8A8A",
  black = "#222222",
  white = "#ffffff")

vertical_grid_df <- tibble(
  x = seq(-4, 6, by = 2),
  ymin = rep(0, length(x)),
  ymax = rep(8.25, length(x))
)

# Create plot ------------------------------------------------------------------

p <- ggplot() +
  geom_point(
    data = town_scores,
    mapping = aes(
      x = mean_education_score,
      y = region,
      fill = income),
    shape = 21,
    size = 4) +
  geom_linerange(
    mapping = aes(x = 0, ymin = 0.25, ymax = 8.25),
    colour = ONS_palette$dark_grey,
    linewidth = 0.8) +
  geom_linerange(
    data = vertical_grid_df,
    mapping = aes(x = x, ymin = ymin, ymax = ymax),
    colour = ONS_palette$light_grey,
    linewidth = 0.3) +
  coord_cartesian(xlim = c(-4, 6), ylim = c(1.25,8.75)) +
  labs(title = strings$title, subtitle = strings$sub, caption = strings$cap) +
  scale_fill_manual(values = unname(ONS_palette)) +
  geom_text(
    data = label_df,
    mapping = aes(
      x = horizontal,
      y = vertical,
      label = string),
    color = ONS_palette$black) +
  geom_segment(
    data = arrow_df,
    mapping = aes(
      x = horizontal,
      xend = horizontal,
      y = upper,
      yend = lower),
    arrow = arrow(length = unit(0.1, "inches"))) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    axis.text = element_text(size = 12, colour = ONS_palette$black),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(linetype = 2),
    plot.title = element_markdown(size = 15,
                                  hjust = 0,
                                  face = "bold",
                                  margin = margin(0,0,15,0)),
    plot.title.position = "plot",
    plot.subtitle = element_markdown(size = 14,
                                     hjust = 0,
                                     margin = margin(0,0,15,0)),
    plot.caption = element_markdown(size = 12,
                                    hjust = 0,
                                    color = ONS_palette$dark_grey),
    plot.caption.position = "plot")


# Save plot --------------------------------------------------------------------

out_path <- "2024-01-23-english-education"

ggsave(
  plot = p,
  filename = paste0(out_path, ".png"),
  device = "png",
  width = 7.5,
  height = 6.5,
  units = "in",
  dpi = 300,
  bg = ONS_palette$white
)

# End of file

Final Product